Systems and methods for improving visual attention models

ABSTRACT

Systems and methods for improving visual attention models use effectiveness assessment from an environment as feedback to improve visual attention models. The effectiveness assessment uses data indicative of a particular behavior, which is related to visual attention allocation, received from the environment to assess relative effectiveness of the environment on influencing the particular behavior.

BACKGROUND

A biological visual system is a capacity limited system in that it can only process a relatively small number of objects at any given time. This is true, despite the fact that there are many objects that may be visible at any given time. From the array of objects visible to a person, that person's visual system will only attend to, or process, one (or very few) objects at any given time. In addition, people can attend to an object while looking at it, which is overt attention, and people can attend to an object without looking at it, which is covert attention.

Understanding what attracts visual attention is a topic of research in psychology, neuroscience and computer science. This research has generated numerous studies directed toward understanding the behavior of human visual attention, as well as many computational models of visual attention. These computational models (sometimes called visual attention models, eye-gaze prediction models, attention models, or saliency models) attempt to simulate where, given visual stimuli (for example, a picture or a scene), a person will allocate his visual attention.

SUMMARY

Systems and methods for improving visual attention models are disclosed. A visual attention model improvement system, consistent with the present invention, comprises a module for receiving visual representations of at least a portion of two environments, a module for receiving output generated by applying a visual attention model on the visual representations of the at least a portion of the two environments, an effectiveness assessment module, and a visual attention model accuracy analysis module. The two environments are different from each other on a visual dimension. The effectiveness assessment module assesses relative effectiveness of the two environments on influencing a particular human behavior based on data indicative of the particular human behavior received from the two environments, wherein the particular human behavior is inferentially related to attention allocation. The visual attention model accuracy analysis module compares the assessed relative effectiveness to the output generated by the visual attention model.

In one embodiment, a visual attention model improvement system comprises a module for receiving visual representation of at least a portion of an environment, a module for receiving output generated by applying a visual attention model on the visual representation of the at least a portion of the environment, an effectiveness assessment module, and a visual attention model accuracy analysis module. The effectiveness assessment module assesses relative effectiveness of the environment on influencing a particular human behavior based on data indicative of the particular human behavior received from the environment, wherein the particular human behavior is inferentially related to attention allocation. The visual attention model accuracy analysis module compares the assessed relative effectiveness to the output generated by the visual attention model.

In another embodiment, a system to modify a visual attention model for a group of environments is disclosed. The system comprises a module for receiving visual representations of at least a portion of one or more environments in the group of environments; a module for receiving output generated by applying the visual attention model on the visual representations of the at least a portion of the one or more environments; a module for assessing the relative effectiveness of the one or more environments on influencing a particular human behavior based on data indicative of the particular human behavior received from the one or more environments, wherein the particular human behavior is inferentially related to attention allocation; and a processing unit for modifying the visual attention model according to a comparison between the assessed relative effectiveness and the output generated by the visual attention model. Optionally, the processing unit associates the modified visual attention model with the group of environments and associates the modified visual attention model with the particular human behavior.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are incorporated in and constitute a part of this specification and, together with the description, explain the advantages and principles of the invention. In the drawings,

FIG. 1 is a block diagram of a visual attention model improvement system;

FIG. 2 is a functional flow diagram for an exemplary visual attention model improvement system;

FIG. 3 is a functional module diagram of an exemplary visual attention model improvement system;

FIG. 4 illustrates an exemplary flowchart of improving visual attention models;

FIG. 5A and FIG. 5B show two exemplary environments differing on edge;

FIG. 6 illustrates an exemplary flowchart of a bottom-up visual attention model;

FIG. 7 is an artist's rendering of an environment;

FIG. 8A is an exemplary modeling output generated by a visual attention model on the artist's rendering in FIG. 7;

FIG. 8B is an exemplary modeling output generated by a visual attention model on the artist's rendering in FIG. 7;

FIG. 9 illustrates an exemplary flowchart of a hybrid visual attention model;

FIG. 10 is an exemplary flowchart of a visual representation module;

FIG. 10A is another exemplary flowchart of a visual representation module;

FIG. 11A and FIG. 11B illustrate two exemplary environments different from each other on luminance;

FIG. 12 shows an exemplary flowchart of a visual attention model accuracy analysis module;

FIG. 13A and FIG. 13B illustrate two exemplary environments in a digital signage network; and

FIG. 14A and FIG. 14B illustrate two exemplary digital environments.

DETAILED DESCRIPTION

While visual attention models have been studied for many years, more recently they have been increasingly used for commercial purposes. As these uses have developed, questions remain as to the accuracy of the models and ways to effectively improve them, as a visual attention model is a simulation of visual attention that cannot perfectly model visual attention allocation of a viewer. When a biological visual system receives a visual input, a retinal image representation is formed. Generally speaking, visual attention operates as a two-stage process. Early-stage visual processing is based on low-level, non-volitional features, also referred to as bottom-up features, such as color, contrast, luminance, motion, orientation, and the like. Later, volitional features, also referred to as top-down features, such as spatial biases, prior knowledge, tasks, expectations, and goals, will influence the early-stage visual processing. The biological visual system processes the information combining both the bottom-up visual features and the top-down influences to allocate visual attention to a specific object or region in the visual input.

A visual attention model (VAM) simulates neurological processes and psychological effects in a biological visual system. First, visual representation of a visual input, in a form consumable by a VAM, simulates a retinal image representation. Both bottom-up features and top-down effects may be simulated in a visual attention model. However, errors are likely introduced into visual representation of the visual input. For example, a human retina may have different sensitivity to luminance from a digital camera. Additionally, the bottom-up features and the top-down effects simulated in the visual attention model are different from neurological processes and psychological effects in an actual visual system. Therefore, three potential sources of error are present in a visual attention model: the visual representation, simulation of bottom-up features, and simulation of top-down effects.

Improving modeling accuracy of visual attention models is important and necessary to further the technology development in this area. People have conducted eye-tracking studies to record human fixation patterns to compare with patterns simulated by visual attention models. Eye-tracking studies measure a point of gaze, referred to as visual fixation, which is directly related to overt visual attention allocation. In other words, eye-tracking studies measure visual fixation, which is the location where a viewer is looking while the viewer is in an environment. However, eye-tracking studies are difficult, invasive, and expensive. Indeed, eye-tracking studies are economically impractical in many situations, such as a study in a nationwide digital signage system including signage displays located in hundreds of stores.

The present disclosure is directed to a system and method for improving modeling accuracy of visual attention models, using data that is indirectly correlated to the accuracy of a model. This approach is referred to as an effectiveness assessment approach. When a viewer is in an environment, the viewer's visual attention to the environment may influence the viewer's decision. The viewer's behavior may reflect the viewer's decision. In some cases, the effectiveness assessment approaches may use data indicative of viewers' behavior, which is inferentially related to viewers' visual attention, to provide feedback to visual attention models. In some other cases, the effectiveness assessment approaches may use data indicative of viewers' behavior along with visual fixation measurement data to provide feedback to visual attention models.

In one embodiment, visual representations of at least a portion of two environments that differ from each other on a visual dimension are obtained. The environments are expected to influence a particular human behavior that is inferentially related to attention allocation in the environment and about which data can be collected. An output of a visual attention model for these two representations may be generated. From the collected data, the relative effectiveness of the two environments on influencing the particular human behavior can be assessed and used to modify the visual attention model if the assessment is inconsistent with the output generated by the visual attention model.

Visual dimensions are features that may be represented in a visual attention model. For example, visual dimensions may be bottom-up features, which are the particulars of a visual input, such as color, edges, luminance, faces, intensity, font, orientation, motions, distance from fovea, etc. As another example, visual dimensions may be top-down, volitional effects, such as spatial biases, prior-knowledge influences, task-based influences, and the like.

To better understand this disclosure, FIG. 1 illustrates an embodiment of a visual attention model improvement system 105. In this system, an environment 100 represents an area where a viewer is located, will be located and/or where a viewer may attend to an object of interest. An environment 100 may be any number of different areas, for example, a natural environment, such as a retail store, an outdoor scene, or a building and the like, or a digital environment that is created by a computer or a group of computers, such as a webpage, a video game, and the like.

The environment 100 may influence a particular behavior of visitors. For example, an exhibition in a museum may be designed to showcase a sculpture, so a particular behavior may be visitors' viewing the sculpture, or being close to the sculpture. In a store environment designed to promote sales of products, for instance, a particular behavior may be purchasing the products, or picking up the products. Another exemplary particular behavior is to follow a directional sign in a building, such as an ‘exit’ sign. In an environment of a webpage designed to highlight a product, the particular behavior may be opening a pop-up window containing a detailed product description, viewing enlarged images, or adding the product to a virtual shopping cart, for example.

Visual attention allocation in an environment is indirectly related to how effectively the environment influences a particular behavior. An environment may have a visual stimulus to be attended and a number of visual stimuli that are distracting. A particular behavior may be related to how likely an object of interest in an environment is attended. For example, when a piece of advertisement content is salient in an environment, viewers likely attend to the advertisement and possibly act on the advertisement, such as purchasing an item that is advertised. As another example, when a directional sign is salient in an indoor environment, visitors likely see the sign and follow its direction. Therefore, the effectiveness of the environment on influencing viewers' particular behavior may indicate the attention allocation in an environment, for example, whether an object of interest is likely attended.

Visual attention model 110 may be used to evaluate whether an environment is properly designed to effectively influence a specified behavior according to its simulation of visual attention or saliency. Visual attention models (VAM) simulate the extent to which objects or regions in an environment differ with respect to the likelihood that they will attract visual attention. A VAM takes input of a visual representation of at least a portion of an environment, which is any form of input that is amenable to evaluation by a visual attention model, and may, for instance, be an image, a digital photograph, a virtual 3D scene, a webpage, a document, a set of visual parameters, or a video.

A visual attention model 110 generates an output indicating regions, objects, or elements that may receive visual attention in the environment. A visual attention model's disposition of an object refers to how a model of visual attention characterizes the relative saliency of an object within an environment. For example, some visual attention models will superimpose a trace line around predicted objects. The disposition of an object may be considered as “selected” (when traced) or “not selected” by the model. Other visual attention models will generate heat maps that may be superimposed over the image or viewed separately from the image. The disposition of an object is the degree to which the model has selected the object (or not selected the object). Some visual attention models may generate and assign a value to a particular object and/or region, referred to as a saliency number, with the value representing that object's saliency in relative terms. In the context of a saliency number, the disposition of the object may be the saliency number itself.

Some visual attention models may generate and assign a value to an object or region, referred to as a sequence number, representing the order that a viewer will attend to the object or region compared with other objects or regions. The disposition of the object may be the sequence number. Some visual attention models may generate and assign a value to an object or region, referred to as a probability number, representing the probability that an object or region will be attended in a given time period. The disposition of the object may be the probability number. An object has relatively high saliency in an environment, according to a VAM output, when it is selected, or has a high saliency number, a high probability number, or a low sequence number compared with other objects in the environment.
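By way of illustration only, the dispositions described above may be compared as in the following minimal sketch. The `Disposition` container, its field names, and the order in which the disposition types are consulted are assumptions made for this example and are not part of any particular visual attention model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disposition:
    """Hypothetical container for one object's VAM output; any field may be absent."""
    selected: Optional[bool] = None      # traced ("selected") by the model or not
    saliency: Optional[float] = None     # saliency number, higher means more salient
    probability: Optional[float] = None  # probability of being attended in a time period
    sequence: Optional[int] = None       # order of attendance, lower means earlier


def more_salient(a: Disposition, b: Disposition) -> Optional[bool]:
    """Return True if a is more salient than b, False if less, None if undetermined.

    The priority among disposition types is an assumption of this sketch.
    """
    if a.saliency is not None and b.saliency is not None and a.saliency != b.saliency:
        return a.saliency > b.saliency
    if a.probability is not None and b.probability is not None and a.probability != b.probability:
        return a.probability > b.probability
    if a.sequence is not None and b.sequence is not None and a.sequence != b.sequence:
        return a.sequence < b.sequence  # a low sequence number means high saliency
    if a.selected is not None and b.selected is not None and a.selected != b.selected:
        return a.selected
    return None
```

For instance, `more_salient(Disposition(sequence=1), Disposition(sequence=3))` evaluates to `True`, because the first object is predicted to be attended earlier.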

In FIG. 1, an effectiveness assessment module 120 may assess how effectively the environment influences a particular behavior based on data indicative of the particular behavior received from an environment. This assessment may then be used to evaluate and improve visual attention models. In this context, the particular behavior may be inferentially related to attention allocation, but not directly related to attention allocation. That is, visual attention can be inferred from the assessed effectiveness. In one embodiment, an environment with higher effectiveness indicates that an object of interest is more likely to be attended in the environment than an environment with lower effectiveness.

In some cases, the particular behavior may be directly inferentially related to attention allocation, which means that the occurrence of the particular behavior indicates where visual attention is allocated. For example, a particular behavior of mouse-click indicates that visual attention is allocated at the position of the mouse, such as a button on a webpage, a link on a document, and the like. As an example, a particular behavior may be a mouse-click on a button on a webpage, and the effectiveness assessment module 120 may collect data on the number of clicks on the button. As another example, a particular behavior of a user touching on a touch panel indicates that visual attention is allocated at the position of the touch.

In some other cases, the particular behavior is indirectly inferentially related to attention allocation, which means that the occurrence of the particular behavior indicates where visual attention is likely allocated. In a particular embodiment, occurrence of a particular behavior indicates that an object of interest in an environment is likely attended to. However, a viewer may demonstrate the particular behavior even if the viewer does not attend to the object of interest. For example, a digital signage display presents an advertisement on an item in a store. People looking at the display may be influenced and purchase the advertised item, while some people may purchase the item without seeing the advertisement. For a particular behavior of viewers' purchasing a product, for instance, the effectiveness assessment module 120 may collect point-of-sale data. A product may be, for example, a particular item, items of a model of product, or items of a type of product. For a particular behavior of viewers' awareness of a product described on a webpage, the effectiveness assessment module 120 may collect surveys filled out by customers. As another example, for a particular behavior of people following a directional sign, the effectiveness assessment module 120 may use video data to analyze the amount of traffic following the sign's direction. In some cases, the effectiveness assessment module 120 may collect data in a natural environment that is related to an environment expected to influence a particular behavior. For example, the effectiveness assessment module 120 may collect data in a store while a sign to promote sales of a product is outside the store.

In some embodiments, the effectiveness assessment module 120 may collect data that is indicative of the particular behavior but not a measurement of the particular behavior. In an exemplary embodiment, for a particular behavior as viewers' intention to purchase a product, the effectiveness assessment module 120 may collect data utilizing sensor technology on the number of people who are close to the product or the number of times that the product is picked up. In some cases, for a particular behavior of viewers' awareness of a product described on a webpage, the effectiveness assessment module 120 may gather data on the length of time people visit the webpage, the amount of purchases originated from this webpage, or the number of visits to a webpage with further information on the product directed from the webpage. In an exemplary embodiment, for a particular behavior of people following an ‘exit’ sign, the effectiveness assessment module 120 may collect data on the length of time for people to exit a building in a fire drill. In one other exemplary embodiment, for a particular behavior of people's using a service advertised on a TV program, the effectiveness assessment module 120 may gather data on the number of customers calling the phone number shown in the TV program, or the volume of service requested.

In one particular embodiment, the effectiveness assessment module 120 may collect data from two comparable environments, which are different from each other on an aspect of interest, and determine relative effectiveness of the two environments. Thus, the relative effectiveness may indicate the impact of the aspect of interest while impacts of other aspects of the environments are negated.

The effectiveness assessment approach of the present disclosure is advantageous over eye-tracking studies in many aspects. First, the effectiveness assessment approach may gather data readily available, such as point-of-sale data, survey data, or the amount of time viewers spent on a webpage. Second, the effectiveness assessment approach may be conducted in situations that are impractical for eye-tracking studies. For example, for an outdoor environment having a billboard with a restaurant advertisement erected by a highway, it is impossible to gather eye-tracking data while people drive by the billboard. However, the effectiveness assessment approach may collect and analyze data on the number of visits to the restaurant while the billboard is displayed, or while one version of a billboard is being displayed as compared to another. Third, the effectiveness assessment approach may gather data indicating viewers' overt attention, where viewers attend to an object of interest via looking at it, and covert attention, where viewers attend to the object without looking at it, while eye-tracking studies cannot gather data on covert attention.

FIG. 2 is a functional flow diagram for an exemplary visual attention model improvement system 190. An environment 210 may be an instance or sample of an actual environment, which is either a natural environment or a digital environment, or a class of physical environments. An environment may be, for example, a webpage, a banner within a webpage, a traffic scene, a photograph, a virtual 3D scene, a store having a digital signage display, a lobby with a sign, a video of a thirty-second commercial, an instrumentation panel with buttons, a webpage with a piece of dynamic advertisement, etc. Alternatively, an environment may be a design of a display environment that does not physically exist. As another example, two environments could be two instances of the same physical environment at different times of day, for example, an environment in the afternoon and one at night.

A visual representation 220 of at least a portion of the environment 210 is used as an input to a visual attention model. The visual representation 220 may be, for example, a set of visual parameters, an image, a set of images in which each image represents an aspect of the environment, a digital photograph, a virtual 3D scene, a webpage, a document, a video, and the like.

Visual attention model (VAM) 230 generates a modeling output for the visual representation 220. The modeling output may be represented, for example, by a saliency map, trace lines, saliency numbers, sequence numbers, probability numbers, and the like.

The environment 210 may influence a specified behavior, such as following a sign, making a purchase, or visiting an exhibition. Data indicative of the specified behavior will be collected from the environment 210 (block 240). Here, data indicative of the specified behavior may be, for instance, point-of-sale data, number of visits, survey data, motion sensor data, or video data. The effectiveness of the environment 210 will be assessed based on the collected data (block 250). The assessed effectiveness will be compared with the modeling output to provide feedback to visual attention model 230. The visual attention model 230 may be modified based on the feedback. For example, if the relative saliency of a target element is higher than that of distractive elements in an environment, and viewers attending to the target element are likely to be influenced on a specified behavior, the effectiveness of the environment should be high. If, however, the effectiveness assessed from the collected data indicative of the specified behavior is low, the assessed effectiveness is inconsistent with the VAM output. The visual attention model may then be modified such that its modeling output is consistent with the assessed effectiveness.

In a simplified embodiment, two environments differ in a visual dimension represented in a visual attention model. The two environments may influence a particular human behavior. The particular human behavior may be related to whether a person attends to an object of interest in the environments. Visual representations of at least a portion of the two environments may be generated or selected from a library of visual representations. Modeling outputs for the visual representations may be generated by the VAM. Data indicative of the particular human behavior may be gathered from each of the two environments. The data may be obtained from direct measurement of the particular human behavior, measurement data indicative of the particular behavior, or a data repository containing data indicative of the particular behavior. Then, the effectiveness of each environment on influencing the particular human behavior is assessed based on the gathered data. The relative effectiveness of the two environments is compared with the modeling outputs. The VAM may be modified based on the comparison.

In an exemplary embodiment, saliency numbers for objects or regions in the environment may be generated by the VAM. The relative effectiveness may be compared with the saliency number of the same object of interest in the two environments, environment A and environment B, for example. If the saliency number of the object in environment A is close to the saliency number of the object in environment B, but the assessed effectiveness of environment A is higher than the assessed effectiveness of environment B, the relative effectiveness of the environments is inconsistent with the VAM output. The visual attention model may be modified to generate modeling output such that the saliency number of the object in environment A is higher than the saliency number in environment B.
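By way of illustration only, the consistency check described above may be sketched as follows. The function name `vam_consistent` and the relative `tolerance` used to decide when two values are "close" are assumptions made for this example; the disclosure does not specify a threshold.

```python
def vam_consistent(saliency_a, saliency_b, effectiveness_a, effectiveness_b,
                   tolerance=0.05):
    """Return True if the VAM ordering of an object of interest in environments
    A and B agrees with the ordering of the assessed effectiveness.

    tolerance is a hypothetical relative threshold below which two values are
    treated as close (that is, as not differing).
    """
    def ordering(x, y):
        # 0 when the values are close, 1 when x is higher, -1 when y is higher
        if abs(x - y) <= tolerance * max(abs(x), abs(y)):
            return 0
        return 1 if x > y else -1

    return ordering(saliency_a, saliency_b) == ordering(effectiveness_a, effectiveness_b)
```

With hypothetical values, saliency numbers of 0.61 and 0.60 are treated as close, so an assessed effectiveness of 120 visits for environment A against 80 visits for environment B would be flagged as inconsistent with the VAM output.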

In one embodiment, if the two environments differ from each other on one visual dimension but are similar on other visual dimensions, the visual attention model may be modified on an aspect related to the visual dimension on which the two environments differ. For example, one environment is a webpage of green background with a banner of a lake resort, and the other environment is a webpage of orange background with the same banner and similar to the webpage of green background in other visual dimensions and contents. In this example, the visual representations of the two environments may be the two webpages. Both webpages are designed to attract viewers to go to a lake resort website, and the number of visits to the lake resort website directed from the webpage is recorded. The banner has a similar saliency number on both webpages, for example, based on the analysis of the visual attention model. However, the recorded number of visits to the website indicates that more visits are directed from the webpage with orange background. A parameter related to the background color in the visual attention model may be modified to generate output indicating the banner has higher saliency in the webpage with orange background.
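By way of illustration only, a parameter adjustment of this kind may be sketched with a toy model. The linear weighted-sum model, the feature names, the feature values, and the step-wise update below are all assumptions made for this example; they do not describe any particular visual attention model or the manner in which an actual model would be modified.

```python
def banner_saliency(weights, features):
    """Toy saliency number: weighted sum of the banner's feature values."""
    return sum(weights[name] * value for name, value in features.items())


def adjust_weight(weights, features_low, features_high, name,
                  step=0.1, max_steps=100):
    """Raise weights[name] until the banner scores higher in the environment
    that the collected data indicated was more effective (features_high).

    Returns a modified copy; the input weights are left unchanged.
    """
    if features_high[name] <= features_low[name]:
        raise ValueError("raising this weight cannot produce the desired ordering")
    adjusted = dict(weights)
    for _ in range(max_steps):
        if banner_saliency(adjusted, features_high) > banner_saliency(adjusted, features_low):
            return adjusted
        adjusted[name] += step
    raise RuntimeError("no consistent weight found within max_steps")


# Hypothetical feature values of the banner on the two webpages.
green_page = {"color_contrast": 0.2, "luminance": 0.5}
orange_page = {"color_contrast": 0.6, "luminance": 0.5}
weights = {"color_contrast": 0.0, "luminance": 1.0}

new_weights = adjust_weight(weights, green_page, orange_page, "color_contrast")
```

In this sketch the banner initially receives the same saliency number on both webpages; after the adjustment, the weight on the hypothetical `color_contrast` feature is positive and the banner scores higher on the webpage with orange background.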

Visual Attention Model Improvement System

FIG. 3 is a functional module diagram of an exemplary visual attention model (VAM) improvement system 300, which is a system for improving the visual attention model. Visual attention model improvement system 300 is, in the embodiment shown in FIG. 3, within computer system 310. Computer system 310 may be any general purpose or application-specific computer or device. It may be a stand-alone computer, or a plurality of networked computers or devices. Further, computer system 310 may include, for instance, a handheld computer, digital camera, or a tablet PC, or even a cellular telephone. Computer system 310, in one embodiment, has various functional modules (not shown in FIG. 3) that comprise an operating system. Such an operating system facilitates the visual attention model improvement system's access to the computer system's resources. Computer system 310 may have a processor and memory, and various traditional input/output interfaces.

In one embodiment, visual representation module 320 may generate a desired number of visual representations of a portion of environments, which may be designed to achieve a visual goal or may have an impact on a specified behavior. For example, visual representation of at least a portion of an environment may be one or more photographs taken from the environment or a video recorded from the environment. In another embodiment, visual representation module 320 may select a desired number of visual representations of a portion of environments from a data repository storing visual representations of environments. In one particular embodiment, visual representation module 320 may select or generate environments that may differ on a visual dimension. For example, visual representation module 320 may select pictures of two store settings, where each store setting has a digital signage display and the display content in one store is different from the display content in another store. The visual representation module 320 is discussed further below.

Visual attention model (VAM) module 330 is any embodiment of any visual attention model or combination of models. The VAM module 330 takes the input of a visual representation of at least a portion of an environment and generates a modeling output. Visual attention model module 330 is shown in FIG. 3 as part of visual attention model improvement system 300, but visual attention model module 330 in another embodiment operates as a stand-alone computer process or even as a service provided over any type of computer network (such as the World Wide Web) at a remote computer. Visual attention model module 330 will be discussed further below.

An optional data collection module 340 may be included in a VAM improvement system. Data collection module 340 collects data indicative of a specified behavior from an environment. In some cases, the collected data may be selected from a data repository, or recorded from the environments. In an environment of a store having a digital signage display, for example, the collected data may be point-of-sale data of a product when the digital signage display is presenting an advertisement of the product. For an exhibition layout designed to promote viewing of an item, for example, the collected data may be sensor data on the number of people visiting the item, or the number of people passing an area close to the item. As another example, to evaluate a behavior of following a sign or a group of signs, the collected data may be the amount of time for viewers moving from a starting location to a designated location. Alternatively, to evaluate a behavior of following a traffic sign or a group of traffic signs, which are designed to reduce accidents in a highly congested area, the collected data may be the number of accidents. In addition, to evaluate people attending to a piece of content on a webpage, the collected data may be the number of visits to a website directed from the piece of content on the webpage.

Data collection module 340 may collect data in accordance with experimental design principles. On a digital signage network, data collection module 340 may utilize techniques described in detail in U.S. Patent Application Publication No. 2010/0017288, entitled “Systems and Methods for Designing Experiments,” U.S. Patent Application Publication No. 2009/0012848, entitled “System and Method for Generating Time-slot Samples to Which Content May be Assigned for Measuring Effects of the Assigned Content,” and U.S. Patent Application Publication No. 2009/0012927, entitled “System and Method for Assigning Pieces of Content to Time-slots Samples for Measuring Effects of the Assigned Content,” which are incorporated herein by reference in their entirety.

Based on data gathered in data collection module 340, effectiveness assessment module 350 determines relative effectiveness of an environment on influencing the specified behavior. Given two environments (environment A and environment B), for example, environment A and environment B may be the same museum show room but with the location of a displayed item changed. The effectiveness of environment A is higher than the effectiveness of environment B, for instance, if the number of people visiting the displayed item at environment A per unit duration is more than the number of people visiting the displayed item at environment B per unit duration. As another example, environment A and environment B may be two stores of the same chain, each having a digital signage display playing an advertisement on a designer bag, while the advertisements on the two signage displays differ in the image size of the designer bag. The effectiveness of environment A is higher than the effectiveness of environment B, for instance, if the amount of sales of the advertised item from environment A is higher than the amount of sales from environment B. As another example, the effectiveness of environment A is higher than the effectiveness of environment B if the amount of time to finish a transaction online in environment A is shorter than the amount of time in environment B. Here, environment A and environment B may be two webpages for people performing a same transaction.

In some cases, the effectiveness assessment module 350 may quantify the relative effectiveness. For example, the relative effectiveness of two environments on influencing a specified behavior may be the ratio of the number of visits from the two environments, the ratio of the amount of sales from the two environments, the inverse ratio of the length of time from the two environments, and so on. In some cases, the effectiveness of an environment on influencing more than one particular behavior may be evaluated by the effectiveness assessment module 350. For example, the effectiveness assessment module 350 may use data indicative of users' purchasing a product in addition to data indicative of users' being close to the product.
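
The ratio-based quantification described above can be illustrated with a minimal sketch. The function name `relative_effectiveness`, the `lower_is_better` flag, and the sample counts are assumptions introduced for illustration only; they are not part of the disclosed system.

```python
def relative_effectiveness(measure_a, measure_b, lower_is_better=False):
    """Return the effectiveness of environment A relative to environment B.

    For counts such as visits or sales, a larger measure means a more
    effective environment, so the plain ratio A/B is used. For measures
    such as time to finish a transaction, a smaller measure is better,
    so the inverse ratio B/A is used (an assumption of this sketch).
    """
    if measure_a <= 0 or measure_b <= 0:
        raise ValueError("measures must be positive")
    if lower_is_better:
        return measure_b / measure_a
    return measure_a / measure_b


# Hypothetical data: 120 visits in environment A, 80 in environment B.
visits_ratio = relative_effectiveness(120, 80)
# Hypothetical data: 30 s per transaction in A, 45 s in B (shorter is better).
time_ratio = relative_effectiveness(30.0, 45.0, lower_is_better=True)
```

With these sample values both ratios exceed 1, indicating that environment A is the more effective of the two under either measure.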

VAM accuracy analysis module 360 may compare relative effectiveness generated by the effectiveness assessment module 350 and modeling output generated by the visual attention model module 340. In an exemplary embodiment, the modeling output is a saliency number of an object of interest in an environment. In some cases, the modeling output is consistent with the relative effectiveness of the environments if the ratio of the saliency number of the object of interest in environment A and environment B is equivalent to the ratio of the relative effectiveness of environment A and environment B, such that

$\frac{{Saliency}\mspace{14mu} \left( {{environment}\mspace{14mu} A} \right)}{{Saliency}\mspace{14mu} \left( {{environment}\mspace{14mu} B} \right)} = \frac{{Effectiveness}\mspace{14mu} \left( {{environment}\mspace{14mu} A} \right)}{{Effectiveness}\mspace{14mu} \left( {{environment}\mspace{14mu} B} \right)}$

If the ratio of saliency is different from the ratio of effectiveness, the VAM may be modified.
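
As a hedged illustration of this comparison, the sketch below tests whether the saliency ratio and the effectiveness ratio agree. Measured data rarely match exactly, so the sketch assumes a relative tolerance (`rel_tol`); the tolerance, the function name, and the sample numbers are assumptions, not values from the disclosure.

```python
import math


def is_consistent(saliency_a, saliency_b, effectiveness_a, effectiveness_b,
                  rel_tol=0.1):
    """Compare Saliency(A)/Saliency(B) with Effectiveness(A)/Effectiveness(B).

    Returns True when the two ratios agree within the relative tolerance,
    which is a hypothetical threshold chosen for this sketch.
    """
    saliency_ratio = saliency_a / saliency_b
    effectiveness_ratio = effectiveness_a / effectiveness_b
    return math.isclose(saliency_ratio, effectiveness_ratio, rel_tol=rel_tol)


# Saliency doubles and sales double: the model output is consistent.
consistent_case = is_consistent(0.6, 0.3, 200, 100)
# Saliency is equal but sales double: the VAM may need to be modified.
inconsistent_case = is_consistent(0.5, 0.5, 200, 100)
```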

In some other cases, the relative effectiveness of a placebo environment (environment P), which is an environment with no influence on the specified behavior, is also determined by the effectiveness assessment module 350. An accuracy indicator may be determined by the following equation,

${{Accuracy}\mspace{14mu} {Indicator}} = \frac{\frac{\begin{matrix}{{{Effectiveness}\mspace{14mu} \left( {{environment}\mspace{14mu} A} \right)} -} \\{{Effectiveness}\mspace{14mu} \left( {{environment}\mspace{14mu} P} \right)}\end{matrix}}{\begin{matrix}{{{Effectiveness}\mspace{14mu} \left( {{environment}\mspace{14mu} B} \right)} -} \\{{Effectiveness}\mspace{14mu} \left( {{environment}\mspace{14mu} P} \right)}\end{matrix}}}{{Saliency}\mspace{14mu} {\left( {{environment}\mspace{14mu} A} \right)/{Saliency}}\mspace{14mu} \left( {{environment}\mspace{14mu} A} \right)}$

The VAM accuracy analysis module 360 is described further below.

VAM modification module 370 is the visual attention model modification module. VAM modification module 370 modifies aspects of the visual attention model's parameters or architecture. This modification may be accomplished in many ways depending on the implementation of the visual attention model module 340. For example, visual attention model module 340 may itself support function calls that modify aspects of how the visual attention model works. In one embodiment, visual attention model module 340 may support a function call that modifies a weight factor related to a certain visual dimension represented in a VAM (luminance, for example). In another embodiment, if the visual attention model module is invoked via command line, various switches could be employed to change variables that are within the visual attention model module. Alternatively, if the visual attention model module 340 is embodied in a script or programming code, the VAM modification module could modify the script or programming code itself. In another embodiment, the entire visual attention model may be replaced by another visual attention model. The particular ways in which the VAM modification module 370 modifies the visual attention model module 340's underlying visual attention model (or the application of such a model to a scene) are discussed further below. In some embodiments, VAM accuracy analysis module 360 and VAM modification module 370 may operate on a separate server or as a service provided over a computer network.

Data repository 380 handles data storage needs of the visual attention model improvement system 300. In some cases, visual representations of at least a portion of environments may be stored in the data repository 380. Data indicative of a specified behavior may also be stored in the data repository 380. Among other things, the effectiveness assessment for each environment may be stored in the data repository 380. In one embodiment, each group of environments may have its own set of parameters for a visual attention model. In another embodiment, each environment may have its own set of parameters for a visual attention model. In these embodiments, data repository 380 may store the parameter sets.

Data repository 380 may be any computer memory. For example, it may be random access memory, a flat file, an XML file, or one or more database management systems (DBMS) executing on one or more database servers or a data center. A database management system may be a relational (RDBMS), hierarchical (HDBMS), multidimensional (MDBMS), object oriented (ODBMS or OODBMS) or object relational (ORDBMS) database management system, and the like. Data repository 380, for example, may be a single relational database such as SQL Server from Microsoft Corporation. In some cases, data repository 380 may be a plurality of databases that may exchange and aggregate data by a data integration process or software application. In an exemplary embodiment, part of the data repository 380 may be hosted in a data center.

FIG. 4 illustrates an exemplary flowchart of improving visual attention models. Initially, visual representations of at least a portion of two environments are received (step 410). In some cases, visual representations of more than two environments may be used. In another embodiment, a placebo, which is a visual representation of an environment with no impact on the particular behavior, is received. Next, data indicative of a particular behavior that is related to visual attention allocation is collected from the two environments (step 420). The collected data are used to assess relative effectiveness of the two environments on influencing the particular behavior (step 430). The relative effectiveness of the two environments is then compared with the modeling output generated by applying the VAM on the visual representations (step 440). If the modeling output is not consistent with the assessed relative effectiveness, the model may be modified (step 450).
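
The flow of FIG. 4 can be sketched as a single function, under several stated assumptions: the VAM is modeled as a callable returning one saliency number per visual representation, the behavior data are single positive numbers, consistency is judged with a hypothetical relative tolerance, and `toy_vam` and `add_edge_term` are invented stand-ins rather than any actual model or modification procedure.

```python
def run_improvement_cycle(vam, modify_vam, visual_a, visual_b,
                          data_a, data_b, rel_tol=0.1):
    """One pass through steps 430-450 for two environments.

    Returns the (possibly modified) model and a flag telling whether the
    original model output was already consistent with the behavior data.
    """
    effectiveness_ratio = data_a / data_b              # step 430
    saliency_ratio = vam(visual_a) / vam(visual_b)     # step 440
    consistent = (abs(saliency_ratio - effectiveness_ratio)
                  <= rel_tol * effectiveness_ratio)
    if consistent:
        return vam, True
    return modify_vam(vam), False                      # step 450


def toy_vam(visual):
    # Hypothetical model that scores luminance only and ignores edges.
    return visual["luminance"]


def add_edge_term(vam):
    # Hypothetical modification: add an edge term with a fixed weight of 0.5.
    return lambda visual: vam(visual) + 0.5 * visual["edge"]


# Steps 410-420 with made-up inputs: two environments differing only on edges.
env_a = {"luminance": 1.0, "edge": 1.0}
env_b = {"luminance": 1.0, "edge": 0.0}
revised_vam, was_consistent = run_improvement_cycle(
    toy_vam, add_edge_term, env_a, env_b, data_a=150, data_b=100)
_, now_consistent = run_improvement_cycle(
    revised_vam, add_edge_term, env_a, env_b, data_a=150, data_b=100)
```

In this toy run the original model rates both environments equally while the behavior data differ by a factor of 1.5, so the model is modified; the revised model then reproduces the observed ratio.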

For example, visual representations of two environments, environment A and environment B, as shown in FIG. 5A and FIG. 5B, are simplified store environments having digital signage displays each playing a burger advertisement. These two environments are similar on all visual dimensions such as color, luminance, and faces, except on edges (the border around the display content). The modeling output generated by a VAM is saliency numbers. The saliency number of the burger advertisement display is similar in both environments. That is, according to the visual attention model, the difference on edges does not affect the relative saliency of the content. Point-of-sale data for burgers in the two environments is gathered to determine how effectively the two environments influence people to buy burgers. The burger sales indicated by the point-of-sale data collected from environment A, however, are higher than those indicated by the point-of-sale data collected from environment B. Therefore, the modeling output is inconsistent with the relative effectiveness of the two environments indicated by the point-of-sale data. The visual attention model may be modified accordingly.

Visual Attention Models

One basic methodology of visual attention models is represented in FIG. 6, which is proposed by Itti, L. & Koch, C. (2000), A saliency-based search mechanism for overt and covert shifts of visual attention, Vision Research, vol. 40, pages 1489-1506. At a high level, FIG. 6 shows how to generate modeling output of visual attention by assessment of bottom-up features, also referred to as low-level features, such as color, motion, luminance, edges, etc., which serve as building blocks of the visual representations mediating some aspects of human vision. First, a visual representation of a scene, for example, a digital photograph, is provided to a computer-implemented version of the Itti and Koch model (step 610). Next, a feature extraction process analyzes the digital photograph for colors, intensity, orientations, or other scene cues, such as motion, junctions, terminators, stereo disparity, and shape from shading (step 620). The feature extraction process yields a plurality of feature maps (step 630), which are combined to produce a saliency map (step 640). In the case of the Itti and Koch model, saliency numbers of regions and/or objects in the scene are computed based on normalized feature maps. The saliency data may be provided to a user as a rendering of the original digital photograph with the “brightest” objects being those to which the model has predicted visual attention will be next allocated. The saliency numbers are the output of the visual attention model (step 650).
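
The combination of feature maps into a saliency map (steps 630 and 640) can be illustrated with a deliberately simplified sketch. It assumes that feature maps are small two-dimensional lists, that normalization is a plain min-max rescaling, and that the maps are combined by a weighted average; the published Itti and Koch model uses a more elaborate normalization and multi-scale processing, so this is not a reimplementation of that model. All names and values are hypothetical.

```python
def normalize(feature_map):
    """Rescale a 2-D feature map (a list of rows) to the range [0, 1]."""
    values = [v for row in feature_map for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat map carries no contrast, so it contributes nothing.
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature_map]


def saliency_map(feature_maps, weights=None):
    """Combine named feature maps into one map by a weighted average."""
    names = sorted(feature_maps)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights by default
    normalized = {name: normalize(feature_maps[name]) for name in names}
    total = sum(weights[name] for name in names)
    rows = len(normalized[names[0]])
    cols = len(normalized[names[0]][0])
    return [
        [sum(weights[n] * normalized[n][r][c] for n in names) / total
         for c in range(cols)]
        for r in range(rows)
    ]


# Made-up 2x2 feature maps for two visual dimensions.
maps = {
    "color": [[0, 2], [4, 8]],
    "intensity": [[1, 1], [1, 3]],
}
saliency = saliency_map(maps)
```

Here the bottom-right cell is the maximum of both feature maps, so it receives the highest saliency number in the combined map.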

Itti and Koch's model is representative of a bottom-up visual attention model, in that it makes its predictions based on analysis of the particulars of the scene. Other bottom-up visual salience models are described in, for example, D. Gao, V. Mahadevan and N. Vasconcelos (2008), On the plausibility of the discriminant center-surround hypothesis for visual saliency, Journal of Vision, 8(7):13, 1-18.

FIG. 7 is an artist's rendering of a scene 201 that could be provided to a visual attention model such as that of Itti and Koch. It is a simplified scene included here for illustrative purposes only; in practice the scenes are often actual digital photographs, or videos, and are much more complex. FIG. 7 includes a number of objects within the scene, such as the star 202, flower 203, face 204, star 205, arrow 206, and cup 207.

FIG. 8A is an exemplary modeling output generated by a visual attention model on the artist's rendering in FIG. 7. The highlighted (and in this illustration, encircled) objects are those that the model predicts to be visually salient. For example, star 202 is in this figure within highlight border 208; flower 203 is within border 209; face 204 is within border 221; star 205 is within border 211; arrow 206 is within border 212; and cup 207 is within border 213. Thus the model in this instance has determined six objects that are, relative to other objects, more visually salient. This particular model also predicts how attention will move among the objects determined to be above some visual saliency threshold. For example, visual attention pathways 301, 302, 303, 304, and 305 together show a predicted visual attention pathway.

FIG. 8B is another exemplary modeling output generated by a visual attention model on the artist's rendering in FIG. 7. In addition to what is shown in FIG. 8A, FIG. 8B includes the sequence of predicted visual attention. For example, star 202 is labeled “1” (attention sequence number 214), and flower 203 is labeled “2” (attention sequence number 215) and so forth.

Besides bottom-up models, there is another class of models referred to as top-down models of visual attention. In contrast to bottom-up models, these models influence the attention allocation with spatial bias (for example, ‘F’ pattern bias for webpages and center bias for displays), an explicit task (for example, avoiding obstacles and collecting objects), or prior knowledge of the world that will influence where attention will be allocated during a specific search task (for example, chairs tend to be on the floor and not on the ceiling). This knowledge (both task-based and prior-knowledge) is used in conjunction with the bottom-up features to direct attention to objects within the observed scene. Some exemplary top-down models are described in Rothkopf, C. A., Ballard, D. H. & Hayhoe, M. M. (2007), Task and Context Determine Where You Look, Journal of Vision 7(14):16, 1-20; and also in Torralba, A. (2001), Contextual Modulation of Target Saliency, Adv. in Neural Information Processing Systems 14 (NIPS), MIT Press. For example, Torralba's model of visual attention has prior knowledge about the features that comprise a particular type of object and information about the absolute and relative locations of these objects within the scene. This prior knowledge provides “top-down” influences on searching for specific targets within a scene.

The art has evolved to include hybrid visual attention models that have features of both bottom-up and top-down design, and that are adapted for differences in the types of visual representations the models will be exposed to (for example, video versus still images, outdoor images versus web pages, and so forth).

An example of a hybrid visual attention model is described by Navalpakkam, V. & Itti, L. (2005), Modeling the Influence of Task on Attention, Vision Research, vol. 45, pages 205-231. This model receives a task definition, determines task-relevant entities, and predicts visual attention allocation by biasing the attention system with task relevance.

FIG. 9 is an exemplary flowchart of a hybrid visual attention model illustrating the functional steps. First, a visual representation is provided to a hybrid visual attention model (step 910). Next, a feature extraction process analyzes the visual representation for colors, intensity, orientations, or other scene cues, such as motion, junctions, terminators, stereo disparity, and shape from shading (step 920). The feature extraction process yields a plurality of feature maps (step 930). The feature maps are modified with top-down influences, such as task relevance, spatial bias, and prior-knowledge influence (step 940). Then, the feature maps are combined to produce a saliency map (step 950). Saliency numbers of regions and/or objects in the scene are computed based on normalized feature maps. The saliency numbers are the output of the hybrid visual attention model (step 960).
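
Steps 940 and 950 can be sketched as follows, assuming that a top-down influence is expressed as a per-location bias map that multiplies each already-normalized feature map, and that the biased maps are then averaged. The multiplicative form, the function names, and the 2x2 values are assumptions made for illustration; actual hybrid models may apply top-down influences differently.

```python
def apply_top_down(feature_map, bias_map):
    """Step 940 (sketch): scale each location of a feature map by a bias."""
    return [
        [value * bias for value, bias in zip(feature_row, bias_row)]
        for feature_row, bias_row in zip(feature_map, bias_map)
    ]


def combine(feature_maps):
    """Step 950 (sketch): average the feature maps location by location."""
    count = len(feature_maps)
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(m[r][c] for m in feature_maps) / count for c in range(cols)]
        for r in range(rows)
    ]


# Made-up, already-normalized bottom-up feature maps.
color = [[0.2, 0.8], [0.4, 0.6]]
edges = [[0.6, 0.4], [0.2, 0.8]]
# Hypothetical spatial bias favoring the main diagonal of the scene.
spatial_bias = [[1.0, 0.5], [0.5, 1.0]]

biased_maps = [apply_top_down(m, spatial_bias) for m in (color, edges)]
hybrid_saliency = combine(biased_maps)
```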

Visual Representation Module

The visual representation module generates or selects visual representations of at least a portion of one or more environments as input to a visual attention model. FIG. 10 is an exemplary flowchart of the visual representation module. Initially, the module identifies a visual dimension represented in a visual attention model, such as brightness, for evaluating the accuracy of the model on that dimension (step 1010). Next, one or more environments that are different on the visual dimension are selected (step 1020). Visual representations for the one or more environments are generated (step 1030). FIG. 11A and FIG. 11B illustrate two simplified environments that are different from each other on luminance, for example, two fast-food restaurants. The luminance of environment 1120 is higher than the luminance of environment 1110.

In one embodiment, a visual representation module may insert visual representations of one or more objects into a same or similar background to generate visual representations of two environments. Here, a background may be a picture of a store, a computer game background, or a webpage. FIG. 10A illustrates another exemplary flowchart of the visual representation module. First, a visual dimension is identified (step 1010A). Next, one or more objects that differ on the visual dimension are selected or designed, such as a red apple and a green apple (step 1020A). Visual representations of the one or more objects are inserted on a similar background (step 1030A). Visual representations of one or more environments are thereby generated. FIG. 5A and FIG. 5B illustrate two display objects that differ from each other on edges and are displayed on a similar background. FIG. 5A and FIG. 5B may be generated by inserting visual representations of the display objects in an image of the background. In some cases, similar backgrounds may be stores of the same franchise chain that are similar in floor layout and decoration.

In some cases, visual representations of the one or more environments may be selected from a data repository based on the same criteria illustrated above, for example, one or more environments differing on a visual dimension, or a similar background inserted with one or more objects differing on a visual dimension.

VAM Accuracy Analysis Module

The VAM accuracy analysis module analyzes the modeling accuracy of a VAM based on feedback provided by the effectiveness assessment module. FIG. 12 shows an exemplary flowchart of the visual attention model (VAM) accuracy analysis module. First, the relative effectiveness of one or more environments on influencing a specified behavior is received (step 1210). Next, modeling outputs generated by applying a visual attention model to the visual representations of the one or more environments are received (step 1220). In some embodiments, the modeling outputs may be generated and stored in a data repository, and the VAM accuracy analysis module selects the modeling outputs from the data repository. The output of the VAM is compared with the relative effectiveness (step 1230). An accuracy indicator of the VAM is determined based on the comparison (step 1240).

In one particular embodiment, a specified behavior is related to the likelihood of an object or a region in the environment being attended. The modeling output for an object generated by a VAM may be quantified as a number, referred to as relative saliency. Examples include a saliency number that represents the relative saliency of an object or a region compared with other objects or regions, a sequence number that is the order in which a viewer will attend to an object or region compared with other objects or regions, a probability number that is the probability of an object being attended within a given time, and the like. In an exemplary embodiment, a relative saliency of the object in each of the two environments is generated by the VAM. The ratio of the relative effectiveness of the two environments is compared with the ratio of the relative saliency of the object in the two environments, and an accuracy indicator is determined based on the comparison. For example, the accuracy indicator may be computed by an equation as:

${{Accuracy}\mspace{14mu} {Indicator}} = \frac{{Effectiveness}\mspace{14mu} {\left( {{environment}\mspace{14mu} A} \right)/{Effectiveness}}\mspace{14mu} \left( {{environment}\mspace{14mu} B} \right)}{{Saliency}\mspace{14mu} {\left( {{environment}\mspace{14mu} A} \right)/{Saliency}}\mspace{14mu} \left( {{environment}\mspace{14mu} A} \right)}$

where Saliency( ) is the relative saliency of the object, Effectiveness( ) is the relative effectiveness of an environment, and Accuracy Indicator is a computed indication of accuracy.

In some cases, the accuracy analysis may be based on data of assessed effectiveness from several sets of environments and modeling outputs generated by the VAM on those environments. Table 1 illustrates the accuracy indicators for six sets of environments that differ on edge, computed using the accuracy indicator equation above. The accuracy indicator of edge may be the average value of the six test results.

TABLE 1

                    Test 1  Test 2  Test 3  Test 4  Test 5  Test 6  Average
Accuracy Indicator  0.798   0.934   0.743   0.702   0.894   0.632   0.7838
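
The averaging of the six test results in Table 1 can be reproduced with a short sketch. The per-test values are taken from Table 1; the function `accuracy_indicator` restates the ratio-of-ratios equation and assumes that its denominator is the saliency ratio between environment A and environment B. Variable and function names are illustrative only.

```python
from statistics import mean


def accuracy_indicator(eff_a, eff_b, sal_a, sal_b):
    """Ratio of the effectiveness ratio to the saliency ratio (A over B)."""
    return (eff_a / eff_b) / (sal_a / sal_b)


# Accuracy indicators of the six edge tests, as listed in Table 1.
edge_tests = [0.798, 0.934, 0.743, 0.702, 0.894, 0.632]

# The accuracy indicator for the edge dimension is the mean of the tests.
edge_accuracy = mean(edge_tests)
print(round(edge_accuracy, 4))  # matches the Average column of Table 1
```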

In some cases, an accuracy indicator may be an accuracy indicator vector, which may be generated based on a set of experiments in which each experiment is related to one or more visual dimensions of interest. For example, three experiments are conducted: experiment 1 is related to luminance and edge, experiment 2 is related to motion, and experiment 3 is related to color. The experiment results are illustrated in Table 2. An accuracy indicator vector [2.3 1.4 0.8 0.6] may be generated.

TABLE 2

           Exp 1  Exp 2  Exp 3
Luminance  2.3
Edge       1.4
Motion            0.8
Color                    0.6
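
A minimal sketch of assembling the accuracy indicator vector from the per-experiment results follows. The values are those of Table 2; the dictionary layout, the dimension ordering, and the function name are assumptions chosen for illustration.

```python
# Results of Table 2, keyed by experiment and then by visual dimension.
experiments = {
    "Exp 1": {"luminance": 2.3, "edge": 1.4},
    "Exp 2": {"motion": 0.8},
    "Exp 3": {"color": 0.6},
}

# Assumed ordering of the visual dimensions within the vector.
dimension_order = ["luminance", "edge", "motion", "color"]


def accuracy_vector(results_by_experiment, order):
    """Merge the per-experiment results and lay them out in a fixed order."""
    merged = {}
    for results in results_by_experiment.values():
        merged.update(results)
    return [merged[dimension] for dimension in order]


vector = accuracy_vector(experiments, dimension_order)
```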

In some embodiments, the VAM accuracy analysis module may use a nonlinear algorithm to establish the correlation between the relative effectiveness of the environment and the relative saliency of the object of interest. In some cases, depending on the particular behavior being selected, the VAM accuracy analysis module may utilize a mathematical algorithm that fits the relationship between the VAM output and the effectiveness assessment. As a simple example, the accuracy indicator may be computed using the equation below,

${{Accuracy}\mspace{14mu} {Indicator}} = \frac{{Effectiveness}\mspace{14mu} {\left( {{environment}\mspace{14mu} A} \right)^{2}/{Effectiveness}}\mspace{14mu} \left( {{environment}\mspace{14mu} B} \right)^{2}}{{Saliency}\mspace{14mu} {\left( {{environment}\mspace{14mu} A} \right)/{Saliency}}\mspace{14mu} \left( {{environment}\mspace{14mu} A} \right)}$

VAM Modification Module

The VAM modification module modifies the VAM based on the result of the VAM accuracy analysis module. In an exemplary embodiment, the VAM modification module may add or change a weight factor related to a visual dimension represented in a VAM. In some cases, visual attention models may generate a saliency map by combining visual feature maps, as illustrated in the flowcharts in FIG. 6 and FIG. 9. For example, three feature maps are generated from an input visual representation of an environment: a first map sensitive to color, a second one to edge, and a third one associated with luminance. A saliency map is generated based on a weighted combination of these three maps. Typically, these three maps contribute equal weights to the saliency map, indicating that there is no bias for one type of feature over another, and the weights may be represented as a weighting vector (for example, [1 1 1] for an equal weight of the three feature maps). In an exemplary embodiment, the VAM modification module may modify these weights to simulate a viewer that might be biased toward one feature over another. This may be accomplished through a function call that would modify these values based upon a distribution of acceptable values.

In some cases, the VAM modification module may use an accuracy indicator generated by the accuracy analysis module to modify a weight factor of a feature map related to a visual dimension. For example, the original weighting vector of a visual attention model is [1 1 1] corresponding to color, edge, and luminance. The VAM accuracy analysis module generates an accuracy indicator vector as [2 0.5 0.5]. The VAM modification module may modify the weighting vector of the VAM to [2 0.5 0.5] for the feature maps in color, edge, and luminance.
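
One way to read this example is as an element-wise scaling of the weighting vector by the accuracy indicator vector, which yields [2 0.5 0.5] when the original weights are all 1. The sketch below adopts that reading as an assumption; the text does not state how weights other than 1 would be updated, and the function name is hypothetical.

```python
def update_weights(weights, accuracy_vector):
    """Scale each feature-map weight by the matching accuracy indicator."""
    if len(weights) != len(accuracy_vector):
        raise ValueError("weights and accuracy vector must align")
    return [w * a for w, a in zip(weights, accuracy_vector)]


# Order of dimensions: color, edge, luminance.
original_weights = [1, 1, 1]
indicator_vector = [2, 0.5, 0.5]
new_weights = update_weights(original_weights, indicator_vector)
```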

Methods for finding parameters that minimize the error between a function and data are well known in the literature. Some of these approaches include optimization methods such as linear interpolation, genetic algorithms, and simulated annealing. These methods may be used to identify the parameters that minimize the difference between the modeling output and the effectiveness assessment.
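
As a hedged illustration of such a parameter search, the sketch below uses an exhaustive grid search, a simpler technique than the genetic algorithms or simulated annealing named above, to pick an edge weight that minimizes the squared difference between the modeled saliency ratio and the measured effectiveness ratio. The toy saliency model (a base value plus weight times edge strength), the data, and the candidate grid are all invented for illustration.

```python
def squared_error(weight, pairs):
    """Sum of squared differences between saliency and effectiveness ratios."""
    error = 0.0
    for pair in pairs:
        # Toy model: saliency = base + weight * edge strength (an assumption).
        sal_a = pair["base"] + weight * pair["edge_a"]
        sal_b = pair["base"] + weight * pair["edge_b"]
        error += (sal_a / sal_b - pair["effectiveness_ratio"]) ** 2
    return error


def grid_search(pairs, candidates):
    """Return the candidate weight with the smallest squared error."""
    return min(candidates, key=lambda w: squared_error(w, pairs))


# Hypothetical environment pairs that differ only on edge strength.
pairs = [
    {"base": 1.0, "edge_a": 1.0, "edge_b": 0.0, "effectiveness_ratio": 1.5},
    {"base": 1.0, "edge_a": 2.0, "edge_b": 0.0, "effectiveness_ratio": 2.0},
]
candidates = [i / 10 for i in range(0, 21)]  # 0.0, 0.1, ..., 2.0
best_weight = grid_search(pairs, candidates)
```

For this made-up data an edge weight of 0.5 reproduces both measured ratios exactly, so the search selects it. A grid search is practical only for a few parameters; the stochastic methods mentioned above scale better to larger parameter sets.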

In some cases, a visual dimension on which environments differ from each other may not be represented in a visual attention model. If the output generated by the visual attention model is inconsistent with the relative effectiveness of the two environments, a parameter related to the visual dimension may be added to the VAM. For example, a visual attention model generating feature maps sensitive to color, edge, and luminance, as in the example given above, may add a feature map sensitive to orientation in generating the saliency map.

In some cases, a parameter set of a VAM may be stored in a data repository. For example, visual dimensions represented in a VAM and weighting factors associated with the visual dimensions may be stored in a data repository. In some other cases, a particular VAM may be associated with a particular human behavior in a data repository. In an exemplary embodiment, after a VAM is modified according to an accuracy analysis based on relative effectiveness of an environment on influencing a particular human behavior, the parameter set of the VAM may be stored in a data repository, and the VAM may be associated with the environment and the particular human behavior.

In one embodiment, environments that differ on more than one visual dimension may be selected or designed. Visual representations of at least a portion of the environments may be generated or selected from a data repository as input to a visual attention model (VAM). Data indicative of a particular human behavior may be received from the environments, wherein the particular human behavior may be related to attention allocation. Relative effectiveness of the environments on influencing the particular human behavior may be assessed based on the received data. The relative effectiveness of the environments may be compared with modeling outputs generated by the VAM on the visual representations. The VAM may be modified according to the comparison result.

In some embodiments, environments may be classified into groups. Groups of environments may be classified by type of location, time of day, etc. For example, fast-food restaurants in suburban areas are in one group and fast-food restaurants in rural areas are in another group. As another example, a store or stores in a franchise chain in the morning is a group, the store or stores in the franchise chain in the afternoon is a group, and the store or stores in the franchise chain in the evening is another group. In some cases, a natural environment may be a group by itself, or a natural environment at a particular condition may be a group by itself. A group of environments may be associated with a particular VAM that may be different from a VAM associated with another group of environments. In some cases, the particular VAM may be associated with the group of environments and it may be stored in a data repository. Optionally, the particular VAM may be associated with the particular human behavior in the data repository.
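
The association between groups of environments, human behaviors, and VAM parameter sets can be sketched with an in-memory store. The class `ParameterStore`, its methods, the key layout (group, behavior), and the sample weights are hypothetical; an actual data repository could equally be a database table keyed the same way.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterStore:
    """Maps (environment group, behavior) to a VAM weighting-factor set."""
    default: dict
    by_group: dict = field(default_factory=dict)

    def save(self, group, behavior, weights):
        # Store a copy so later edits by the caller do not alter the record.
        self.by_group[(group, behavior)] = dict(weights)

    def load(self, group, behavior):
        # Fall back to the default parameter set for unknown groups.
        return dict(self.by_group.get((group, behavior), self.default))


store = ParameterStore(default={"color": 1.0, "edge": 1.0, "luminance": 1.0})
store.save("suburban fast-food", "purchase",
           {"color": 2.0, "edge": 0.5, "luminance": 0.5})

suburban = store.load("suburban fast-food", "purchase")
rural = store.load("rural fast-food", "purchase")  # not stored: default set
```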

In some cases, the visual attention model improvement system may receive visual representations of at least a portion of one or more environments in a group of environments. A VAM may be applied on the visual representations to generate modeling outputs. In some cases, modeling outputs may be selected from a data repository, which stores modeling outputs generated by applying the VAM on the visual representations. Data indicative of a particular human behavior may be collected or measured from the one or more environments, where the particular human behavior may be related to visual attention allocation. The relative effectiveness of the environments on influencing the particular human behavior may be assessed based on the data. The visual attention model may be modified according to the comparison of the assessed relative effectiveness and the outputs generated by the visual attention model. In some cases, the modified visual attention model may be associated with the group of environments. In some other cases, the modified visual attention model may be associated with the particular human behavior.

In one particular embodiment, one or more environments with similar VAM outputs may be selected, while the one or more environments differ on a visual dimension. The relative effectiveness of the one or more environments is assessed based on collected data indicative of a specified behavior that is related to visual attention allocation. An accuracy indicator is determined by comparing the relative effectiveness with the VAM outputs. In some cases, a weight factor related to the visual dimension may be modified according to the accuracy indicator. In some cases, one or more environments selected or generated have similar semantic elements, for example, displays in the environments have similar textual messages.

In some embodiments, the visual attention model improvement system may be used to improve visual attention models used in an environment having a sign. A sign may provide identification, warning, direction, or other information. For example, a sign may be a traffic sign, a billboard, or a digital sign. A sign may have fixed content or changeable content. Here, an environment is a sign with its surrounding area. In some cases, a visual representation of at least a portion of the environment may be digital photographs taken of the environment or a video recording of the environment.

In one particular embodiment, the visual attention model improvement system may be used to improve visual attention models for digital signage systems. Here, an environment is a digital signage display with its surrounding area. A digital signage display may be an LCD display, a plasma TV, or another kind of display. In some cases, a visual representation of at least a portion of an environment in a digital signage system may be generated by inserting the content on the signage display into an image of the surrounding area. In some other cases, a visual representation may be generated by a digital photograph. As another example, a visual representation may include several digital images of the surrounding area. In some cases, a visual representation may be selected from a data repository that, as an example, stores pieces of content presented on signage displays.

In some cases, a hybrid VAM may be used, which combines bottom-up features with top-down influences. In some cases, an environment is expected to influence people to make a purchase, or to purchase a particular product. In some cases, point-of-sale data from the environment or the point-of-sale data for a particular product from the environment may be collected as indications of people making the purchase or purchasing the product. The sales data is usually obtained in a regular business process. In some cases, the sales data for the period when the selected piece of content is displayed is selected from a data repository. For example, two environments are selected and the signage displays within the environments present an advertisement of a product. The sales data for the product from the two environments are collected. The sales data may be compared with the modeling output generated by the VAM. The modeling output may be generated by applying the VAM to the visual representations of the two environments. Alternatively, the modeling output may be selected from a data repository. In some cases, the modeling output may be the relative saliency represented by numbers. In some embodiments, if the sales data is inconsistent with what the VAM output indicates, for example, if the product has the same relative saliency in the two environments and the sales in one environment are higher than the sales in the other environment, the VAM may be modified.

For example, a store has a digital signage display to present advertisements. Two environments may be the same store having the signage display presenting different content. The visual representations of at least a portion of two environments, as simplified examples, are shown in FIGS. 13A and 13B. One piece of content has a pair of sunglasses (as shown in 1310) and the other piece of content has a picture of a lady wearing a pair of sunglasses (as shown in 1320). A hybrid VAM is used in this example. The relative saliency of the sunglasses in environment 1310 generated by the VAM is similar to the relative saliency in environment 1320. A store visitor's purchasing the sunglasses shown in the advertisement is related to whether the visitor sees the advertisement. In this example, the sales data collected from environment 1310 (for example, when content 1330 is displayed) is lower than the sales data collected from environment 1320 (for example, when content 1340 is displayed). Accordingly, the relative effectiveness of environments 1310 and 1320 on influencing visitors' purchasing of the sunglasses is inconsistent with the modeling outputs generated by the VAM. A parameter related to ‘face’, for instance, may be adjusted or added in the VAM to improve the VAM accuracy.

In another particular embodiment, the visual attention model improvement system may be used to improve visual attention models for webpage designs. Here, an environment is a digital environment, and a visual representation of at least a portion of the environment may be, for example, an image of the webpage, or a set of visual parameters representing the webpage. In some cases, a webpage may be designed to promote viewers' awareness of a specific piece of content on the webpage, for example, a piece of breaking news. The specific piece of content may have a hyperlink directed to a designated webpage, for example, the full story of a piece of breaking news. The number of visits to the designated webpage directed from the piece of content may be used to indicate the viewers' awareness. In some cases, two environments may be designed with the same specific piece of content in two webpages, where the two webpages differ on a visual dimension. In some other cases, two environments may be two webpages with two pieces of content that differ from each other on a visual dimension. An accuracy indicator may be generated based on the comparison of the relative effectiveness of the two webpages with the modeling output generated by the VAM. The VAM may be modified according to the accuracy indicator.
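One possible form of the accuracy indicator is sketched below. The text does not define how the indicator is computed, so the formula here (one minus the absolute difference between the predicted share and the observed share) is an assumption, as are the function names and the sample values. Inputs are assumed to be non-negative.

```python
def share(a: float, b: float) -> float:
    """Fraction of the total attributed to the first of two non-negative values."""
    total = a + b
    return 0.5 if total == 0 else a / total


def accuracy_indicator(saliency_a: float, saliency_b: float,
                       visits_a: float, visits_b: float) -> float:
    """Assumed indicator in [0, 1]; 1.0 means the model matches the visit data."""
    predicted = share(saliency_a, saliency_b)  # what the VAM implies
    observed = share(visits_a, visits_b)       # what the click-through data shows
    return 1.0 - abs(predicted - observed)


# Assumed sample: the model rates webpage B twice as salient, visits are equal.
indicator = accuracy_indicator(0.3, 0.6, 100, 100)
```

A value well below 1.0 would suggest that the VAM over- or under-weights the visual dimension on which the two webpages differ.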

FIG. 14A and FIG. 14B illustrate two exemplary digital environments, for example, two webpages with the same piece of news. The font of the news on webpage 1410 is smaller than the font of the news on webpage 1420. The short message of the news links to a full story of the news at a separate webpage. The number of visits to the full story directed from a webpage containing the short message may be collected to indicate viewers' awareness, which is related to whether a viewer has attended to the short message on the webpage. A bottom-up VAM may be used in this example. Applying the VAM to environments 1410 and 1420, the saliency of news message 1430 in environment 1410 is lower than the saliency of news message 1440 in environment 1420. However, the numbers of visits directed from environments 1410 and 1420 are similar, so the effectiveness of environment 1410 and environment 1420 on influencing viewers' awareness of the news is similar. The VAM may be modified such that a consistent modeling output is generated. For example, a weight factor related to font size for a piece of news in a webpage may be lowered.
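Lowering the font-size weight factor until the modeling output is consistent with the visit data might look like the sketch below. It assumes that all other features of the two webpages are identical, so only the font-size term contributes to the saliency gap. The font scores, starting weight, tolerance, and decay rate are illustrative values, not taken from the disclosure.

```python
def predicted_gap(font_weight: float, font_a: float, font_b: float) -> float:
    """Saliency gap attributable to font size when other features are equal."""
    return abs(font_weight * (font_a - font_b))


def lower_font_weight(font_weight: float, font_a: float, font_b: float,
                      tolerance: float = 0.05, decay: float = 0.8,
                      max_steps: int = 100) -> float:
    """Shrink the weight step by step until the predicted gap is within tolerance."""
    for _ in range(max_steps):
        if predicted_gap(font_weight, font_a, font_b) <= tolerance:
            break
        font_weight *= decay
    return font_weight


# Hypothetical font-size scores for news messages 1430 and 1440.
font_1430, font_1440 = 0.3, 0.8
old_weight = 0.6
new_weight = lower_font_weight(old_weight, font_1430, font_1440)
```

The multiplicative decay is only one choice of update rule; any rule that reduces the weight until the modeled saliencies of 1430 and 1440 are similar would serve the same purpose.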

A first embodiment is a visual attention model improvement system comprising a module for receiving visual representations of at least a portion of two environments that differ from each other on a visual dimension; a module for receiving output generated by applying the visual attention model on the visual representations of the at least a portion of the two environments; a module for assessing the relative effectiveness of the two environments on influencing a particular human behavior based on data indicative of the particular human behavior received from the two environments, wherein the particular human behavior is inferentially related to attention allocation; and a processing unit for comparing the assessed relative effectiveness to the output generated by the visual attention model.

A second embodiment is the visual attention model improvement system of the first embodiment further comprising the processing unit modifying the visual attention model when the assessed relative effectiveness is inconsistent with the output generated by the visual attention model.

A third embodiment is the visual attention model improvement system of the first embodiment or the second embodiment further comprising a module for storing parameters of the visual attention model in a data repository; and a module for associating the parameters of the visual attention model with the particular human behavior in the data repository.

A fourth embodiment is the visual attention model improvement system of the first embodiment, the second embodiment, or the third embodiment, wherein the particular human behavior is indirectly inferentially related to attention allocation.

A fifth embodiment is the visual attention model improvement system of the first embodiment, the second embodiment, or the third embodiment, wherein the data indicative of the particular human behavior comprises at least one of point-of-sale data and motion sensor data.

A sixth embodiment is the visual attention model improvement system of the first embodiment, the second embodiment, or the third embodiment, wherein the output generated by the visual attention model comprises at least one of a saliency map, saliency numbers, sequence numbers, and probability of a region being attended within a given time period.

A seventh embodiment is the visual attention model improvement system of the first embodiment, the second embodiment, or the third embodiment, further comprising a module for selecting two environments with similar output generated by the visual attention model on the visual representations of the at least a portion of the two environments.

An eighth embodiment is the visual attention model improvement system of the first embodiment, the second embodiment, or the third embodiment, wherein the visual dimension is represented in the visual attention model.

A ninth embodiment is the visual attention model improvement system of the eighth embodiment, wherein modifying the visual attention model comprises modifying a parameter of the visual attention model related to the visual dimension.

A tenth embodiment is the visual attention model improvement system of the ninth embodiment, wherein modifying a parameter of the visual attention model comprises modifying a weight factor for the visual dimension represented in the visual attention model.

An eleventh embodiment is the visual attention model improvement system of the first embodiment, the second embodiment, or the third embodiment, wherein modifying the visual attention model comprises adding a parameter related to the visual dimension to the visual attention model.

A twelfth embodiment is the visual attention model improvement system of the first embodiment, the second embodiment, or the third embodiment, wherein the visual dimension comprises at least one of color, luminance, orientation, font, edges, motion, faces, intensity, distance from fovea, spatial bias, prior-knowledge influence, and task-based influence.

A thirteenth embodiment is the visual attention model improvement system of the first embodiment, the second embodiment, or the third embodiment, wherein the two environments have similar semantic elements.

A fourteenth embodiment is the visual attention model improvement system of the first embodiment, the second embodiment, or the third embodiment, wherein the visual attention model is a hybrid visual attention model.

A fifteenth embodiment is a visual attention model improvement system comprising a module for receiving visual representation of at least a portion of an environment; a module for receiving output generated by applying the visual attention model on the visual representation of the at least a portion of the environment; a module for assessing the relative effectiveness of the environment on influencing the particular human behavior based on data indicative of a particular human behavior received from the environment, wherein the particular human behavior is inferentially related to attention allocation; and a processing unit for comparing the assessed relative effectiveness to the output generated by the visual attention model.

A sixteenth embodiment is the visual attention model improvement system of the fifteenth embodiment, wherein the processing unit modifies the visual attention model when the assessed relative effectiveness is inconsistent with the output generated by the visual attention model.

A seventeenth embodiment is the visual attention model improvement system of the first embodiment, wherein the environment is an environment having a sign.

An eighteenth embodiment is a system to modify a visual attention model for a group of environments, the system comprising a module for receiving visual representations of at least a portion of one or more environments in the group of environments; a module for receiving output generated by applying the visual attention model on the visual representations of the at least a portion of the one or more environments; a module for assessing the relative effectiveness of the one or more environments on influencing the particular human behavior based on data indicative of a particular human behavior received from the one or more environments, wherein the particular human behavior is inferentially related to attention allocation; and a processing unit for modifying the visual attention model according to the comparison between the assessed relative effectiveness and the output generated by the visual attention model, associating the modified visual attention model with the group of environments, and associating the modified visual attention model with the particular human behavior.
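The association of a modified visual attention model with a group of environments and a particular human behavior, as in the eighteenth embodiment, could be held in a simple keyed store. The sketch below uses an in-memory dictionary; the `ModelRepository` class, the group and behavior labels, and the parameter values are hypothetical and stand in for whatever data repository an implementation actually uses.

```python
from typing import Dict, Tuple


class ModelRepository:
    """Hypothetical in-memory store keyed by (environment group, behavior)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Dict[str, float]] = {}

    def save(self, group: str, behavior: str, parameters: Dict[str, float]) -> None:
        """Associate a copy of the model parameters with the group and behavior."""
        self._store[(group, behavior)] = dict(parameters)

    def load(self, group: str, behavior: str) -> Dict[str, float]:
        """Return a copy of the stored parameters; raises KeyError if absent."""
        return dict(self._store[(group, behavior)])


repo = ModelRepository()
# Assumed labels and weights, for illustration only.
repo.save("retail_signage", "purchase", {"contrast": 0.5, "color": 0.5, "face": 0.3})
repo.save("news_webpages", "click_through", {"contrast": 0.5, "font_size": 0.08})
params = repo.load("retail_signage", "purchase")
```

Keying on both the group and the behavior lets the same kind of environment carry different parameter sets for different behaviors.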

1. A method of evaluating a visual attention model, comprising: receiving visual representations of at least a portion of two environments that differ from each other on a visual dimension; receiving output generated by applying the visual attention model on the visual representations of the at least a portion of the two environments; assessing the relative effectiveness of the two environments on influencing a particular human behavior based on data indicative of the particular human behavior received from the two environments, wherein the particular human behavior is inferentially related to attention allocation; and comparing the assessed relative effectiveness to the output generated by the visual attention model.
2. The method of claim 1, further comprising modifying the visual attention model when the assessed relative effectiveness is inconsistent with the output generated by the visual attention model.
3. The method of claim 1, wherein the particular human behavior is indirectly inferentially related to attention allocation.
4. The method of claim 1, wherein the data indicative of the particular human behavior comprises at least one of point-of-sale data and motion sensor data.
5. The method of claim 1, wherein the output generated by the visual attention model comprises at least one of a saliency map, relative saliency numbers, sequence numbers, probability of a region being attended within a given time period, and length of time to which a region is attended.
6. The method of claim 1, further comprising selecting two environments with similar output generated by the visual attention model on the visual representations of the at least a portion of the two environments.
7. (canceled)
8. The method of claim 2, wherein the visual dimension is represented in the visual attention model, and wherein modifying the visual attention model comprises modifying a parameter of the visual attention model related to the visual dimension.
9. The method of claim 8, wherein modifying a parameter of the visual attention model comprises modifying a weight factor for the visual dimension represented in the visual attention model.
10-14. (canceled)
15. A visual attention model improvement system, comprising: a module for receiving visual representations of at least a portion of two environments that differ from each other on a visual dimension; a module for receiving output generated by applying the visual attention model on the visual representations of the at least a portion of the two environments; a module for assessing the relative effectiveness of the two environments on influencing a particular human behavior based on data indicative of the particular human behavior received from the two environments, wherein the particular human behavior is inferentially related to attention allocation; and a processing unit for comparing the assessed relative effectiveness to the output generated by the visual attention model.
16. The system of claim 15, wherein the processing unit modifies the visual attention model when the assessed relative effectiveness is inconsistent with the output generated by the visual attention model.
17. The system of claim 15, wherein the particular human behavior is indirectly inferentially related to attention allocation.
18-20. (canceled)
21. The system of claim 16, wherein the visual dimension is represented in the visual attention model.
22. The system of claim 21, wherein modifying the visual attention model comprises modifying a parameter of the visual attention model related to the visual dimension.
23. The system of claim 22, wherein modifying a parameter of the visual attention model comprises modifying a weight factor for the visual dimension represented in the visual attention model.
24. The system of claim 16, wherein modifying the visual attention model comprises adding a parameter related to the visual dimension to the visual attention model.
25. The system of claim 15, wherein the visual dimension comprises at least one of color, luminance, orientation, font, edges, motion, faces, intensity, distance from fovea, spatial bias, prior-knowledge influence, and task-based influence.
26-28. (canceled)
29. A visual attention model improvement system, comprising: a module for receiving visual representation of at least a portion of an environment; a module for receiving output generated by applying the visual attention model on the visual representation of the at least a portion of the environment; a module for assessing the relative effectiveness of the environment on influencing the particular human behavior based on data indicative of a particular human behavior received from the environment, wherein the particular human behavior is inferentially related to attention allocation; and a processing unit for comparing the assessed relative effectiveness to the output generated by the visual attention model.
30. The system of claim 29, wherein the processing unit modifies the visual attention model when the assessed relative effectiveness is inconsistent with the output generated by the visual attention model.
31. The system of claim 29, wherein the environment is an environment having a sign.
32. (canceled)